WikiRank: Improving Keyphrase Extraction Based on Background Knowledge
Abstract
Keyphrases are an efficient representation of the main ideas of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated into keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we output the optimal keyphrase set. Our method improves on other state-of-the-art models by more than 2% in F1-score.
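The three steps in the abstract (build a graph, pose an optimization problem on it, return the optimal set) can be illustrated with a minimal sketch. The abstract does not specify the graph's structure or the objective, so everything below is an assumption made for illustration: the graph is taken to be bipartite, linking candidate phrases to background concepts; the objective is taken to be the total weight of covered concepts; and the solver is a simple greedy heuristic. The function name select_keyphrases, the concept names, and the weights are hypothetical and hand-written, not retrieved from Wikipedia.

```python
from typing import Dict, List, Set


def select_keyphrases(
    candidate_concepts: Dict[str, Set[str]],
    concept_weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Greedily pick up to k candidates that add the most uncovered concept weight.

    candidate_concepts is the assumed bipartite graph: each candidate phrase
    maps to the set of background concepts it is linked to.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(candidate_concepts)

    def gain(phrase: str) -> float:
        # Weight of the concepts this phrase would newly cover.
        return sum(concept_weights.get(c, 0.0) for c in remaining[phrase] - covered)

    while remaining and len(selected) < k:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # No candidate covers anything new.
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Hypothetical toy input: phrases from a document and hand-assigned concepts.
example_candidates = {
    "keyphrase extraction": {"Keyword extraction", "Information extraction"},
    "semantic graph": {"Semantic network", "Graph (discrete mathematics)"},
    "background knowledge": {"Knowledge base"},
    "graph": {"Graph (discrete mathematics)"},
}
example_weights = {
    "Keyword extraction": 3.0,
    "Information extraction": 1.0,
    "Semantic network": 2.0,
    "Graph (discrete mathematics)": 1.0,
    "Knowledge base": 1.5,
}

if __name__ == "__main__":
    print(select_keyphrases(example_candidates, example_weights, k=3))
```

In this toy input the phrase "graph" is never selected once "semantic graph" has been chosen, because its only concept is already covered. That is the kind of redundancy a coverage-style objective is meant to avoid; whether WikiRank uses this exact objective is not stated in the abstract.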
Similar resources
Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction
In this paper we analyze the effectiveness of using linguistic knowledge from coreference and anaphora resolution to improve the performance of supervised keyphrase extraction. In order to verify the impact of these features, we define a baseline keyphrase extraction system and evaluate its performance on a standard dataset using different machine learning algorithms. Then, we consider new ...
Corpus-independent Generic Keyphrase Extraction Using Word Embedding Vectors
Keyphrase extraction from a given document is a difficult task that requires not only local statistical information but also extensive background knowledge. In this paper, we propose a graph-based ranking approach that uses information supplied by word embedding vectors as the background knowledge. We first introduce a weighting scheme that computes informativeness and phraseness scores of word...
SGRank: Combining Statistical and Graphical Methods to Improve the State of the Art in Unsupervised Keyphrase Extraction
Keyphrase extraction is a fundamental technique in natural language processing. It enables documents to be mapped to a concise set of phrases that can be used for indexing, clustering, ontology building, auto-tagging and other information organization schemes. Two major families of unsupervised keyphrase extraction algorithms may be characterized as statistical and graph-based. We present a hyb...
A Keyphrase Generation Technique Based upon Keyphrase Extraction and Reasoning on Loosely Structured Ontologies
Associating meaningful keyphrases with documents and web pages is an activity that can greatly increase the accuracy of Information Retrieval and Personalization systems, but the growing amount of text data available is too large for extensive manual annotation. On the other hand, automatic keyphrase generation, a complex task involving Natural Language Processing and Knowledge Engineering, ca...
Improving Keyphrase Extraction from Biomedical Documents Using Domain Specific Feature Set
Keyphrases enable the reader to quickly determine whether the given article is suitable for the reader’s digest. Keyphrases are also important for medical document retrieval and text mining research. Sometimes, the author-assigned keyphrases or keywords available with the articles are too limited to represent the topical content of the articles. Many medical documents also do not come with auth...